94
8.1
When Does It Become a Challenge for the Computer?
However, the moment we recognise that this may be the unavoidable limitation of our
computational approach and that we naturally systematically do not take these imponder
able, qualitative aspects into account in our bioinformatic models, we are already a signifi
cant step further. So let’s keep in mind: bioinformatics tries to describe biological reality
in clear, transparent models, for example how a normal cell becomes a cancer cell. By
using the computer and experimental data, I make myself blind in terms of experience and
other direct interactions with nature, but I have the undeniable advantage of having quan
titative statements about the biological process through numbers and measures (“give
numbers to the arrows” is what Leroy Hood once called it). This alone, through these
quantitative glasses, prevents being drowned in too much unprovable theory. For example,
the model predicts that 80% of cancer cells will die from treatment. We can simply mea
sure in experiments how far this is true. This also brings up another important implication
for all bioinformatics analyses. For example, if we have found a related sequence to a
sequence of which we know more about the function by sequence comparison, then we
should continue this chain of reasoning (from sequence comparison to sequence compari
son) until we have a clear experiment on the last sequence that biochemically or molecu
larly confirms the function of the protein associated with the sequence. Only then do we
have solid ground within the framework of our model.
So much for the bioinformatic model, which should therefore always base its own cal
culations on solid, experimental data. Now a few sentences about the calculations: After
all, it could be that these calculations take a very long time, and anyone who has ever
“kicked” his computer with such a complicated, lengthy calculation knows the problem of
wondering, “When will this limited computational box finally stop calculating?” The
problems where this is unresolved are called NP problems (NP stands for nondeterministic
polynomial time). There is no simple formula (a polynomial) that allows one to calculate
how long the computer will compute based on the length of the input. Unfortunately, most
biologically exciting problems are such NP problems. This is because biomolecules and
all higher processes in the cell are usually modular, made up of similar or identical units
(see Part 1). Thus, the addition of only one further unit leads to a multiple increase in
computation time, and such combinatorial problems (“combinatorial explosion”) there
fore almost always occur in our biological modelling. This leads to corresponding uncer
tainties in the computation time. However, one can help oneself with fixed specifications,
so-called “stopping criteria“, i.e. stopping specifications for the computer, e.g. “please
stop after one hour of computing time”. But more important is the fact that with a fixed
calculation time it is not possible to estimate how good the solution found up to that point
is in comparison to the best or optimal solution. But that’s just life: Not so easy to grasp!
2014
8.1
8 When Does the Computer Stop Calculating?